Optimize copy tensor with padding #32461
base: master
Tensor layout-related properties are now calculated once, and the cached values are reused during per-element offset calculation. This gives a ~200x reduction in the wait time between two queries for the PhiSlica model: a user waits only 0.36 sec (instead of 74 sec) between two queries. These numbers were measured on LNL. JIRA: https://jira.devtools.intel.com/browse/CVS-174810
        fmt == bfvuwzyx);
}

static void get_axes_map(const format& fmt, int64_t* axes_map, size_t& map_size) {
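Suggestion: static std::vector<int64_t> get_internal_dims(const format& fmt) {
This would give more consistent naming and safer container usage. (A static member function cannot be const-qualified, so the trailing const is dropped.)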
I want to use an array to make this as fast as possible. Even a small perf difference between vector and array adds up to a noticeable difference. If I use a vector here, I have to copy it to a local array in the caller.
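For context on the trade-off discussed above, the following is a sketch of one possible middle ground: a fixed-capacity result returned by value, which avoids heap allocation like a raw array but carries its own size. It is only an illustration under stated assumptions. `AxesMap`, `kMaxRank`, and `make_axes_map` are hypothetical names, the rank limit of 8 is assumed, and the mapping rule (position of each axis letter within an internal order string) is an assumption, not necessarily what `get_axes_map` does in this PR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed upper bound on tensor rank for this sketch.
constexpr std::size_t kMaxRank = 8;

// Fixed-capacity result: stored inline, so no heap allocation is needed.
struct AxesMap {
    std::array<int64_t, kMaxRank> axes{};
    std::size_t size = 0;
};

// Hypothetical helper: for each axis letter in `order`, record its position
// in `internal_order` (or -1 if the letter does not occur there).
inline AxesMap make_axes_map(const std::string& order, const std::string& internal_order) {
    AxesMap m;
    for (char axis : order) {
        if (m.size == kMaxRank) {
            break;  // never write past the fixed capacity
        }
        const std::size_t pos = internal_order.find(axis);
        m.axes[m.size++] = (pos == std::string::npos) ? -1 : static_cast<int64_t>(pos);
    }
    return m;
}
```

A caller can keep the returned `AxesMap` in a local variable and index `axes` directly inside the hot loop, so no copy from a vector into a local array is required.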
for (int64_t y = 0; y < size.spatial[1]; y++) {
    for (int64_t x = 0; x < size.spatial[0]; x++) {
        *dst++ = static_cast<dst_t>(src[layout.get_linear_offset(cldnn::tensor(b, f, x, y, z, w))]);
void convert_and_copy_padded_source(const src_t* src, dst_t* dst, layout& layout) {
This only works for plain formats, so please apply it to plain formats only. For other (e.g., blocked) formats, please use the original method.
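One likely reason for this restriction, sketched under assumptions: in a plain format each axis has a single fixed stride, while in a blocked format one axis is split into an outer block index and an inner position, so a per-axis stride table cannot describe it. The names `Shape`, `offset_plain`, `offset_blocked`, and `linear_offset` below are hypothetical, and the blocked variant assumes feature blocks of size `fsv` stored innermost.

```cpp
#include <cstdint>

// Hypothetical 3-D shape (features, height, width) for this sketch.
struct Shape {
    int64_t f, y, x;
};

// Plain fyx: every axis has one fixed stride.
inline int64_t offset_plain(const Shape& s, int64_t f, int64_t y, int64_t x) {
    return (f * s.y + y) * s.x + x;
}

// Feature-blocked layout: features are grouped into blocks of `fsv`, and the
// position inside a block is the innermost (fastest varying) index.
inline int64_t offset_blocked(const Shape& s, int64_t fsv, int64_t f, int64_t y, int64_t x) {
    const int64_t f_block = f / fsv;
    const int64_t f_inner = f % fsv;
    return ((f_block * s.y + y) * s.x + x) * fsv + f_inner;
}

// Dispatch: take the strided fast path only when the format is plain,
// otherwise fall back to the format-aware calculation.
inline int64_t linear_offset(bool is_plain, const Shape& s, int64_t fsv,
                             int64_t f, int64_t y, int64_t x) {
    return is_plain ? offset_plain(s, f, y, x) : offset_blocked(s, fsv, f, y, x);
}
```

With 8 features, a 2x2 spatial size, and `fsv = 4`, moving one step along `f` advances the offset by 4 in the plain layout but by 1 in the blocked layout, which is why a stride cached for the plain case would read the wrong elements from a blocked buffer.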